Univariate TS Models (ARIMA/SARIMA)

ACF & PACF Plots

Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots are crucial tools in time series analysis, helping to identify the type of model that best describes a series. The ACF plot shows the correlation of the series with its own lags, providing insights into the overall correlation structure and potential seasonality. On the other hand, the PACF plot reveals the direct effect of past values on the current value, helping to pinpoint the order of autoregressive models.

By examining the ACF and PACF plots, we can discern patterns that suggest the presence of autoregressive (AR) or moving average (MA) components in our time series models. A sharp cutoff in the PACF plot indicates potential AR terms (its cutoff lag suggests the order p), while a sharp cutoff in the ACF plot suggests MA terms (its cutoff lag suggests the order q). These plots also assist in assessing the stationarity of the series, a crucial aspect of time series modeling; non-stationary data often require differencing before the orders can be read off reliably.

In this section, we’ll explore the ACF and PACF plots for our datasets, delving into their autocorrelation structures and deriving insights that will inform our model selection and forecasting approach.
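The plots that follow appear to be produced in R, but the quantities themselves are simple to compute. As an illustrative sketch (Python with numpy; the AR(1) series and all names here are invented for the example), the sample ACF divides the lag-k autocovariance by the variance, and the lag-k PACF is the last coefficient of an AR(k) fit obtained from the Yule-Walker equations:

```python
import numpy as np

def acf(x, nlags):
    # Sample autocorrelation: lag-k autocovariance divided by the variance.
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

def pacf(x, nlags):
    # Lag-k PACF = last coefficient of an AR(k) model, via the Yule-Walker
    # equations (solve a Toeplitz system built from the sample ACF).
    r = acf(x, nlags)
    out = [1.0]
    for k in range(1, nlags + 1):
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        out.append(np.linalg.solve(R, r[1:k + 1])[-1])
    return np.array(out)

# An AR(1) series: the ACF decays geometrically, the PACF cuts off after lag 1.
rng = np.random.default_rng(0)
e = rng.normal(size=2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.7 * y[t - 1] + e[t]
print(acf(y, 3).round(2))   # roughly [1, 0.7, 0.5, 0.34]: geometric decay
print(pacf(y, 3).round(2))  # roughly [1, 0.7, 0, 0]: cutoff after lag 1
```

This decay-versus-cutoff signature is exactly what the commentary below reads off the plots for each dataset.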

The ACF plot for crude oil prices demonstrates prolonged significant autocorrelation, suggesting a non-stationary series. The gradual decline in correlation as lags increase indicates a potential long-term dependency or trend in the data.

The PACF plot shows significant spikes at lags 1 and 2, followed by non-significant values.

The ACF and PACF plots suggest considering an ARIMA model with ‘p’ of 1 or 2. The slow decay in the ACF implies that differencing (d > 0) may be necessary to achieve stationarity.

The ACF plot for natural gas prices shows a very slow decay, suggesting non-stationarity and a need for differencing.

The PACF plot shows a significant spike at lag 1, followed by a drop-off.

The ACF and PACF plots suggest considering an ARIMA model with ‘p’ of 1, possibly 2. The slow decay in the ACF implies that differencing (d > 0) may be necessary to achieve stationarity.

The ACF plot for electricity prices shows strong positive autocorrelation across all lags, indicating non-stationarity and a need for differencing; a moving average component may also be worth considering.

The PACF plot shows a significant correlation at lag 1. The choice of ‘p’ could be 1 or 2 based on the first significant spikes.

The consistent autocorrelation in the ACF plot suggests a potential need for a higher-order MA term or for differencing, pointing toward a full ARIMA(p, d, q) specification.

The ACF plot for GDP shows persistent, strong autocorrelation across all lags, indicating non-stationarity and suggesting that differencing may be necessary.

The PACF plot has a sharp cutoff after lag 1, indicating an AR(1) process and suggesting that the previous value has a significant impact on current GDP.

Given the strong autocorrelation and the PACF cutoff, an ARIMA(1,1,0) model, which already incorporates one order of differencing, may be a good starting point for modeling GDP.

The ACF plot for CPI shows sustained high autocorrelation across lags, suggesting a non-stationary time series and reflecting CPI’s long memory.

The PACF plot has a sharp spike at lag 1 and a cutoff thereafter, indicating an AR(1) process and suggesting that the previous value has a significant impact on current CPI.

The sustained autocorrelation in the ACF plot implies that differencing might be needed. An initial ARIMA(1,1,0) model could be considered.

Detrending vs Differencing

Detrending and differencing are two methods used to make time series data stationary. Detrending involves removing the underlying trend from the data, while differencing focuses on the changes between consecutive observations.

Detrending typically subtracts the estimated trend component from the original series, while differencing transforms the series into the sequence of differences between adjacent values. While detrending addresses only the trend, differencing can help eliminate both trend and, when applied at the seasonal lag, seasonality, helping to make the series stationary.
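A small sketch of the two transforms (Python with numpy; the trend-plus-noise series is invented for illustration):

```python
import numpy as np

# Toy series: linear trend plus noise (an assumption for this sketch).
rng = np.random.default_rng(1)
t = np.arange(200, dtype=float)
y = 2.0 + 0.5 * t + rng.normal(scale=3.0, size=200)

# Detrending: fit a linear trend by least squares, keep the residuals.
slope, intercept = np.polyfit(t, y, 1)
detrended = y - (intercept + slope * t)

# Differencing: the change between consecutive observations.
differenced = np.diff(y)

# Detrended residuals average to zero; the differenced series averages to
# roughly the trend slope (0.5), with the trend itself removed.
print(round(detrended.mean(), 4), round(differenced.mean(), 2))
```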

Detrended:

The residuals after detrending indicate that linear detrending might not fully account for all underlying components, as patterns still emerge.

First Difference:

The differenced series shows fluctuations around zero, indicative of improved stationarity. While the mean appears stabilized, examining autocorrelation in the differenced data is crucial to confirm stationarity fully.

Detrended:

The detrended series shows periods of volatility, indicating that removing the linear trend doesn’t capture all of the data’s dynamics; the remaining fluctuations may reflect other underlying components, such as seasonality.

First Difference:

The differenced series has a consistent mean but varying volatility, with a spike around 2020 that could be due to the pandemic.

Detrended:

The detrended plot of electricity prices displays clear periodic fluctuations, suggesting the presence of seasonality.

First Difference:

The first differenced series oscillates around a central mean, which is indicative of stationarity in the mean of the series. However, the consistent pattern of spikes followed by a return to the mean indicates a strong seasonal component.

Detrended:

The detrended GDP plot shows residuals with a clear non-linear component: they decline gradually, begin to rise after the 1980s, and accelerate significantly in recent years. The GDP growth rate is not constant, and a simple linear model may not be sufficient to capture these complexities.

First Difference:

The differenced series predominantly hovers around the zero line, indicating that this transformation effectively removes the trend and yields a series that is stationary in the mean. The substantial spike observed towards the end likely reflects the economic shock of the COVID-19 pandemic, which typical GDP growth patterns do not account for.

Detrended:

The detrended CPI plot reveals residuals that decline over a prolonged period before stabilizing and then increasing. This pattern suggests that a simple linear trend does not fully capture the complexity of the inflationary trend over time.

First Difference:

The first difference plot for CPI demonstrates a series that fluctuates around a central mean value.

Original vs First Difference

Time series data often embody intrinsic trends and seasonality, which can confound analyses if not properly addressed. Two techniques to transform such data into a stationary form are detrending and differencing.

Detrending involves the removal of a trend line from the time series, thus flattening the data into a horizontal line around the mean.

Differencing, on the other hand, involves computing the difference between consecutive observations. This method is effective at eliminating trend, and, applied at the seasonal lag, seasonality as well, transforming the series into one where the mean level does not change over time. Differencing is a critical step in preparing data for ARIMA (AutoRegressive Integrated Moving Average) modeling.


Augmented Dickey-Fuller Test

After applying various transformations to address the non-stationarity in our time series data, our next step is to conduct the Augmented Dickey-Fuller test on the differenced data. This test is crucial for confirming that the adjustments we’ve made have effectively rendered the series stationary. Stationarity is a key assumption for many time series forecasting methods, as it implies that the statistical properties of the series are consistent over time. By applying the Augmented Dickey-Fuller test to our transformed data, we aim to confirm that the series no longer contain a unit root, the hallmark of a mean that drifts over time.
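The output below comes from R’s adf.test() (tseries package). As a rough illustration of the idea (Python with numpy; this is a simplified, non-augmented Dickey-Fuller statistic, whereas the real test adds lagged-difference terms and uses Dickey-Fuller rather than normal critical values):

```python
import numpy as np

def df_stat(y):
    # Regress the first difference on a constant and the lagged level;
    # return the t-statistic of the lagged-level coefficient. A strongly
    # negative value is evidence against a unit root.
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(2)
walk = np.cumsum(rng.normal(size=1000))   # random walk: has a unit root
print(round(df_stat(walk), 2))            # typically modest: cannot reject unit root
print(round(df_stat(np.diff(walk)), 2))   # strongly negative: stationary
```

This mirrors the pattern in the results below: the original (level) series fail to reject the unit root, while the differenced series reject it decisively.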

Warning in adf.test(diff_crude_oil_ts, alternative = "stationary"): p-value
smaller than printed p-value

    Augmented Dickey-Fuller Test

data:  diff_crude_oil_ts
Dickey-Fuller = -9.3996, Lag order = 8, p-value = 0.01
alternative hypothesis: stationary

Since the p-value in the augmented test is clearly below 0.05, we reject the null hypothesis and conclude that the differenced series is stationary. The fact that the differenced series is stationary (while the original was not) suggests that crude oil prices exhibit a trend, or another form of non-stationarity, that differencing removes.

Warning in adf.test(diff_citygate_gas_ts, alternative = "stationary"): p-value
smaller than printed p-value

    Augmented Dickey-Fuller Test

data:  diff_citygate_gas_ts
Dickey-Fuller = -8.689, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary

The p-value is again clearly below 0.05, leading us to reject the null hypothesis and conclude that the differenced natural gas price series is stationary. This indicates that the series, once differenced, does not contain a unit root and its mean is stable over time.

Warning in adf.test(diff_total_electricity_ts, alternative = "stationary"):
p-value smaller than printed p-value

    Augmented Dickey-Fuller Test

data:  diff_total_electricity_ts
Dickey-Fuller = -19.045, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary

With the p-value below 0.05, we reject the null hypothesis and conclude that the differenced electricity price series is stationary: after differencing, the series no longer exhibits a unit root and its mean is consistent over time.

Warning in adf.test(diff_gdp_ts, alternative = "stationary"): p-value smaller
than printed p-value

    Augmented Dickey-Fuller Test

data:  diff_gdp_ts
Dickey-Fuller = -5.4304, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

The p-value below 0.05 allows us to reject the null hypothesis, concluding that the differenced GDP series is stationary. This suggests that the original GDP series had a trend or other non-stationary components that were effectively removed by differencing.

Warning in adf.test(diff_cpi_ts, alternative = "stationary"): p-value smaller
than printed p-value

    Augmented Dickey-Fuller Test

data:  diff_cpi_ts
Dickey-Fuller = -5.5073, Lag order = 9, p-value = 0.01
alternative hypothesis: stationary

Since the p-value is now below 0.05, we can reject the null hypothesis, concluding that the differenced CPI series is stationary.

After implementing the Augmented Dickey-Fuller test on our differenced data sets, we observed a significant reduction in the p-values for all the data sets. This result strongly suggests that the differencing process has effectively induced stationarity in these series, as indicated by the absence of a unit root. However, to ensure the robustness of our findings and to precisely model and forecast these series, a subsequent examination of the ACF and PACF plots for the differenced data is imperative. These plots will provide further insights into the autocorrelation structure of the data, guiding us in the selection of appropriate ARIMA model parameters.

First vs Second Differencing

In time series analysis, differencing is a technique used to stabilize the mean of a series and make it stationary. When trends and seasonality are present in a time series, they can affect the predictive models. Differencing helps to mitigate these influences by focusing on the changes in the data rather than the actual values.

Differencing operates under the principle of transformation. It is designed to remove specific types of patterns:

  • First Differencing: This method subtracts the previous observation from the current one. It is a powerful tool for eliminating trends, providing a clearer view of the underlying cyclical components and irregularities.

  • Second Differencing: When first differencing is not enough to achieve stationarity, or when the time series exhibits a more complex pattern such as a trend within a trend, second differencing can be employed. This involves applying the differencing operation twice, which can further simplify the predictive structure by reducing more complex serial correlations.
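A toy example of the distinction (Python with numpy; the quadratic trend is assumed purely for illustration): one round of differencing on a quadratic trend leaves a linear trend behind, while a second round removes it entirely.

```python
import numpy as np

t = np.arange(100, dtype=float)
y = 0.1 * t ** 2            # quadratic trend: a "trend within a trend"

d1 = np.diff(y)             # first difference: 0.1 * (2t + 1), still trending
d2 = np.diff(y, n=2)        # second difference: constant 0.2, trend removed
print(d1[:3], d2[:3])
```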

AIC & BIC

In time series analysis, choosing the right model is paramount for accurate forecasting. Two of the most critical metrics for model selection are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Both criteria are grounded in information theory and provide a means to balance model fit with model complexity.

  • AIC is a tool for model selection that quantifies the trade-offs between model complexity (the number of parameters in the model) and the goodness of fit. AIC rewards models that achieve a high goodness of fit but penalizes those that become overly complex. A lower AIC value often indicates a preferable model.

  • BIC extends the logic of AIC by incorporating sample size into the penalty for complexity. This adjustment makes BIC more stringent with complex models when dealing with larger datasets. As with AIC, a lower BIC suggests a better model.
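Both criteria come directly from a model’s maximized log-likelihood. A minimal sketch (Python; the log-likelihood and parameter counts are illustrative numbers, not values from the report’s models):

```python
import numpy as np

def aic_bic(loglik, k, n):
    # AIC = 2k - 2 ln L ; BIC = k ln(n) - 2 ln L,
    # for k estimated parameters and n observations.
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

# Same fit (same log-likelihood), two extra parameters: AIC adds 4 to the
# penalty, BIC adds 2 ln(200) ≈ 10.6, so BIC favours the smaller model more.
print(aic_bic(-500.0, k=3, n=200))
print(aic_bic(-500.0, k=5, n=200))
```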

   p d q      AIC      BIC     AICc
24 4 1 3 3090.304 3129.861 3090.609
   p d q      AIC      BIC     AICc
24 4 1 3 3090.304 3129.861 3090.609
   p d q      AIC      BIC     AICc
12 2 1 1 3095.071 3117.047 3095.172
   p d q      AIC      BIC     AICc
24 4 1 3 3090.304 3129.861 3090.609
   p d q      AIC     BIC     AICc
17 3 1 1 1031.985 1057.04 1032.162
   p d q      AIC     BIC     AICc
17 3 1 1 1031.985 1057.04 1032.162
  p d q     AIC      BIC     AICc
2 0 1 1 1037.54 1050.068 1037.591
   p d q      AIC     BIC     AICc
17 3 1 1 1031.985 1057.04 1032.162
   p d q      AIC       BIC      AICc
20 3 1 4 -412.675 -374.8158 -412.3046
   p d q      AIC       BIC      AICc
20 3 1 4 -412.675 -374.8158 -412.3046
   p d q       AIC       BIC      AICc
23 4 1 2 -412.1291 -378.4765 -411.8334
   p d q      AIC       BIC      AICc
20 3 1 4 -412.675 -374.8158 -412.3046
  p d q      AIC    BIC     AICc
4 1 1 1 4033.393 4048.3 4033.525
  p d q      AIC    BIC     AICc
4 1 1 1 4033.393 4048.3 4033.525
  p d q      AIC    BIC     AICc
4 1 1 1 4033.393 4048.3 4033.525
  p d q      AIC    BIC     AICc
4 1 1 1 4033.393 4048.3 4033.525
  p d q      AIC    BIC     AICc
4 1 1 1 4033.393 4048.3 4033.525
  p d q      AIC    BIC     AICc
4 1 1 1 4033.393 4048.3 4033.525
  p d q      AIC    BIC     AICc
4 1 1 1 4033.393 4048.3 4033.525
  p d q      AIC    BIC     AICc
4 1 1 1 4033.393 4048.3 4033.525

Fitting ARIMA

Fitting an ARIMA model estimates the AR and MA coefficients for the chosen (p, d, q) orders, typically by maximum likelihood. The orders identified from the ACF/PACF plots and the AIC/BIC comparison above serve as natural starting points for each series.
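As a rough sketch of the estimation step (Python with numpy, simulated data; real ARIMA fitting, as in R’s Arima() or auto.arima(), uses maximum likelihood over all AR and MA terms jointly), the AR(1) coefficient of a series can be recovered by regressing the series on its own first lag:

```python
import numpy as np

# Simulate an AR(1) series with true coefficient 0.6 (assumed for the sketch).
rng = np.random.default_rng(3)
e = rng.normal(size=1500)
y = np.zeros(1500)
for t in range(1, 1500):
    y[t] = 0.6 * y[t - 1] + e[t]

# Conditional least squares for AR(1): slope of y_t on y_{t-1}, no intercept.
phi = np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])
print(round(phi, 2))  # close to the true 0.6
```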

Model Diagnostics

Model diagnostics check whether a fitted model has captured the structure in the data. The residuals should resemble white noise: no significant autocorrelation should remain in the residual ACF, and a Ljung-Box test provides a formal check of residual independence.

auto.arima()

The auto.arima() function in R’s forecast package automates order selection, performing a stepwise search over combinations of (p, d, q) (and seasonal orders) and returning the model with the lowest AICc. It serves as a useful cross-check against the orders we selected manually.

Forecasting

With models fitted, forecasts are produced by iterating the model equations forward from the end of the observed series. Prediction intervals widen as the horizon grows, reflecting the accumulation of forecast uncertainty.

Comparing with Benchmark Methods

Finally, simple benchmark methods, such as the mean, naive (last observed value), and drift methods, provide baselines against which the ARIMA forecasts can be judged; the fitted models should outperform these baselines on held-out data to justify their added complexity.
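A minimal sketch of such a comparison (Python with numpy; the random-walk series and hold-out split are invented for illustration, and the naive and drift forecasts follow their standard definitions):

```python
import numpy as np

rng = np.random.default_rng(4)
y = np.cumsum(rng.normal(size=300))      # toy series
train, test = y[:250], y[250:]

h = np.arange(1, len(test) + 1)          # forecast horizons 1..50
naive = np.full(len(test), train[-1])    # naive: repeat the last observation
drift = train[-1] + h * (train[-1] - train[0]) / (len(train) - 1)  # drift line

def mae(forecast):
    # Mean absolute error over the hold-out period.
    return np.mean(np.abs(test - forecast))

print(round(mae(naive), 2), round(mae(drift), 2))
```

In R, forecast::accuracy() reports the analogous error measures for fitted models and benchmarks alike.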